lxml removes spaces and line breaks in

For me, print(tostring(e, encoding=str)) returns:

    >>> print(tostring(e, encoding=str))
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 1493, in tostring
        encoding=encoding)
      File "lxml.etree.pyx", line 2836, in lxml.etree.tostring (src/lxml/lxml.etree.c:53416)
    TypeError: descriptor 'upper' of 'str' object needs an argument

I cannot speak to the discrepancy, but I do suggest setting the argument pretty_print to True:

    >>> etree.tostring(e, pretty_print=True)
    '\n \n \n \n \n \n \n \n \n\n'

You will need to import etree:

    from lxml import etree

When the result is written to an output file, the spaces and newlines will be preserved. The same holds with print:

    >>> print(etree.tostring(e, pretty_print=True))

I am sure you have checked out the API, but in case you haven't, here is information on tostring(). It is also safe to assume you have seen the tutorial on the lxml website.

I would love to see some more 'good' resources. I am new to lxml myself, and anything new and good to read would be welcome.

UPDATE: You said you would consider sed if you could not find a good Python solution.

This should accomplish it with sed:

    sed -i '1,2d;' input.html; sed -i '1 i\' input.html

This runs two sed commands. The first deletes the first 2 lines. The second inserts on the first line.

UPDATE #2: I should have thought about this more. You can do this with Python:

    >>> import re
    >>> newString = re.sub('\n ', '', etree.tostring(e, encoding=unicode, pretty_print=True), count=1)
    >>> print(newString)
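To make the effect of count=1 in that re.sub call concrete, here is a minimal sketch that uses only the standard library. The sample string and the helper name strip_first_break are assumptions for illustration. They stand in for whatever etree.tostring returns and are not taken from the question.

```python
import re

# Hypothetical sample standing in for pretty-printed output;
# the real string would come from etree.tostring(e, pretty_print=True).
pretty = "<html>\n <head>\n <title>t</title>\n </head>\n</html>\n"


def strip_first_break(text):
    """Remove only the first newline-plus-space sequence in text."""
    # count=1 stops after the first match, so later line breaks are untouched.
    return re.sub("\n ", "", text, count=1)


result = strip_first_break(pretty)
print(repr(result))
```

Only the break between the first two tags is removed. Every other newline and space is left as it was, so this patches one known difference and does not amount to a general way of preserving the input formatting.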

I use Python 3; with Python 2, use unicode instead. Also, I guess this exception is caused by an old version of lxml. It seems you installed lxml from the package manager and not with easy_install. – Taha Jahangir Jun 24 at 15:28

pretty_print rearranges elements! I want to preserve spaces, not prettify. In fact, I want to take input from the user and not change it by passing it through from_string and then to_string. – Taha Jahangir Jun 24 at 15:28

Ah, Python 3 might explain the small disagreements in syntax. I still recommend pretty_print=True; from my output, it does what your question asked: it preserves spaces and line breaks. Perhaps update your question using pretty_print=True, show its output, and contrast it against your desired output, because I am not quite sure what you're asking now. NOTE: do this using etree.tostring. – matchew Jun 24 at 15:32

See the edited version of the question. I expect the output to be exactly the same as the input. pretty_print=True adds a \n between and . – Taha Jahangir Jun 24 at 15:46

That will not actually preserve formatting, it will just recreate something that looks like what this example has... – Lennart Regebro Jun 247 at 5:45

Finally, I used html5lib to parse the HTML and generate an lxml-like tree with it:

    parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("lxml"), namespaceHTMLElements=False)
